Abstract


Multilinear formula score theory provides powerful methods for solving
psychological measurement problems of long standing. In this paper the
question of information in incorrect option selection on multiple choice
items is addressed. Multilinear formula scoring (MFS) is first used to
estimate option characteristic curves for the Armed Services Vocational
Aptitude Battery Arithmetic Reasoning test. Accurately estimated curves are
obtained for real and simulated data. Then the statistical information
about ability is computed for dichotomous and polychotomous scorings of the
items. Moderate gains in information are obtained for low to slightly above
average abilities. The dichotomous and polychotomous models are then
compared for their relative performances in appropriateness measurement.
The rates of detection of some types of aberrant responding were more than
100% higher for optimal polychotomous appropriateness indices than for any
dichotomous model index. Consequently the MFS polychotomous model provides
opportunities for better testing by allowing more accurate ability
estimates, improvements in the theory and practice of item writing, and more
powerful appropriateness measurement.
Multilinear formula score theory or multilinear formula scoring (MFS;
Levine, 1985a, 1985b) is a nonparametric item response theory for which
consistent and asymptotically efficient estimators of ability densities,
item characteristic curves (ICCs), and option characteristic curves (OCCs)
have been derived and programmed. MFS provides a powerful new approach to
substantive questions of long standing. These questions include determining
the shapes of ability distributions and the magnitudes of differences
between ability distributions of various groups, determining the shapes of
item characteristic curves for unidimensional and multidimensional tests,
identifying biased and other faulty items, and assessing the extent to which
two tests measure the same ability.

In this paper we focus on MFS's ability to estimate efficiently option
response curves from small samples for responses that may fail to satisfy
the local independence assumption of item response theory. The benefits of
this endeavor shall be assessed in two ways. First, we determine the
increase in information about ability due to polychotomous scoring of item
responses. Here the term "information" is used in its statistical sense to
mean the expected squared derivative of the logarithm of the likelihood
function. Since the asymptotic standard error of the maximum likelihood
estimate of an ability $\theta$ equals the square root of the reciprocal of
the information function at $\theta$, an increase in information due to
polychotomous scoring is readily translated into percent test length
reduction made possible by polychotomous scoring.
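
As a worked illustration of that translation, the sketch below assumes (this is an assumption, not a statement from the text) that information grows in proportion to test length, as it would for a test lengthened with parallel items. Under that assumption an information ratio r means that only 1/r of the items are needed to match the dichotomous standard error. The function names and the numerical values are hypothetical.

```python
import math

def standard_error(information):
    """Asymptotic standard error of the ML ability estimate: 1 / sqrt(information)."""
    return 1.0 / math.sqrt(information)

def percent_length_reduction(info_dichotomous, info_polychotomous):
    """Percent of items that could be dropped while keeping the dichotomous
    standard error, ASSUMING information is proportional to test length."""
    ratio = info_polychotomous / info_dichotomous
    return 100.0 * (1.0 - 1.0 / ratio)

# Hypothetical information values at one ability level.
info_dich, info_poly = 8.0, 10.0
print(round(standard_error(info_dich), 4), round(standard_error(info_poly), 4))
print(round(percent_length_reduction(info_dich, info_poly), 1))
```

The reduction depends on ability, so in practice it would be evaluated separately at each ability level of interest.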

The second comparison is between the dichotomous and polychotomous item
response models' potentials for supporting appropriateness measurement.
Levine and Rubin (1979) introduced this term to refer to model-based methods
for detecting response patterns that yield faulty measures of ability. For
example, test scores are spuriously high when a low ability examinee copies
some answers from a high ability neighbor or has been given answers to some
questions by an informant. Spuriously low test scores result from alignment
errors, atypical educations, unusual creativity, deliberate failure, and a
variety of other sources.

Of course, the model-based detectability of a particular type of
aberrance depends upon the item response model used to analyze the data;
more specific (polychotomous) models are expected to be rejected more
frequently when fitted to aberrant response patterns and thus provide
superior appropriateness measurement. Recently Levine and Drasgow (1984,
1987) developed a technique for computing the power of the most powerful
appropriateness measurement procedure supported by an item response model.
By combining the new optimality results with MFS's ability to accurately
recover the option characteristic curves needed for polychotomous modeling
we intend to determine whether polychotomous modeling is negligibly or
markedly superior to dichotomous modeling in detecting test anomalies.

This  study  also  contributes  to  formula  score  theory  in  that  it  contains 
a  verification  of  MFS  theoretical  results  with  simulation  data. 

Review  of  Multilinear  Formula  Score  Theory 

In  this  section  we  review  MFS  theory  as  it  is  used  in  this  paper.  The 
theory  is  more  general  than  outlined  here,  but  for  the  sake  of  clarity  we 
describe only the special case required for the present research.

Let $u_i$ denote the response to the $i$th item of an $n$ item test,
scored $u_i = 1$ if correct and $u_i = 0$ if incorrect. The $u_i$ generate
the elementary formula scores, which can be enumerated as

$$
\begin{gathered}
1 \\
u_1,\ u_2,\ \ldots,\ u_n \\
u_1 u_2,\ u_1 u_3,\ \ldots,\ u_{n-1} u_n \\
\vdots \\
u_1 u_2 \cdots u_n .
\end{gathered}
$$

Traditional formula scoring (Lord and Novick, 1968, especially Chapter
14) generally uses only linear scores. When there is neither omitting nor
polychotomous scoring, linear formula scores are formulas with a constant
term plus a linear combination of the binary item scores,
$u_1, u_2, \ldots, u_n$. (When there is omitting and polychotomous scoring,
a linear score is a constant plus a linear combination of binary variables
indicating omitting and option choice.)

Multilinear formula score theory generalizes traditional formula score
theory by using quadratic scores (linear scores added to linear combinations
of $u_1 u_2, u_1 u_3, \ldots, u_{n-1} u_n$), cubic scores (quadratic scores
plus linear combinations of products of item scores for three different
items), and higher order scores. Most of the results in this paper were
obtained with fifth order scores. The new theory is called "multilinear"
because frequent use is made of the fact that when all the scores except one
are held constant, a "linear" score is obtained.
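
The enumeration of elementary formula scores can be sketched in a few lines. The function name and the dictionary layout below are illustrative choices, not part of MFS software; the empty subset is taken to give the constant score 1, so that an n item test yields 2^n scores in all.

```python
from itertools import combinations
from math import prod

def elementary_formula_scores(u, max_order=None):
    """Map each subset of item indices to the product of those item scores.

    u is a list of 0/1 item scores. max_order limits the number of items in a
    product (2 = quadratic, 3 = cubic, ...); None means all orders.
    """
    n = len(u)
    if max_order is None:
        max_order = n
    scores = {}
    for order in range(max_order + 1):
        for subset in combinations(range(n), order):
            scores[subset] = prod(u[i] for i in subset)  # empty product is 1
    return scores

u = [1, 0, 1, 1]                                   # hypothetical response pattern
all_scores = elementary_formula_scores(u)
quadratic = elementary_formula_scores(u, max_order=2)
print(len(all_scores), len(quadratic))
print(quadratic[(0, 2)], quadratic[(0, 1)])
```

Because each product is linear in any single item score when the others are held fixed, every linear combination of these scores is a multilinear function of the item scores.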

In this paper we shall assume that the regression of $u_i$ on the latent
trait $\theta$ is a three-parameter logistic ogive

$$
E(u_i \mid \theta = t) = c_i + \frac{1 - c_i}{1 + \exp[-D a_i (t - b_i)]} = P_i(t),
$$

where $D$ is a scaling constant set equal to 1.702, $a_i$ is the
discrimination parameter, $b_i$ is the difficulty parameter, and $c_i$ is the
lower asymptote of the ICC. By local independence, the regressions of the
elementary formula scores on the latent trait can then be written

$$
\begin{gathered}
1 \\
P_1(t),\ P_2(t),\ \ldots,\ P_n(t) \\
P_1(t) P_2(t),\ P_1(t) P_3(t),\ \ldots,\ P_{n-1}(t) P_n(t) \\
\vdots \\
P_1(t) P_2(t) \cdots P_n(t),
\end{gathered}
$$

where each $P_i(t)$ is a three-parameter logistic ICC.
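
A minimal sketch of the three-parameter logistic ICC and of the regression of a product score follows. The item parameters are made-up values for illustration, and the function names are hypothetical.

```python
import math

D = 1.702  # scaling constant given in the text

def icc_3pl(t, a, b, c):
    """Three-parameter logistic ICC: c + (1 - c) / (1 + exp(-D a (t - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (t - b)))

def product_regression(t, items, subset):
    """Regression on ability of the product of the item scores in `subset`.

    Under local independence this is the product of the corresponding ICCs;
    the empty subset gives the constant function 1.
    """
    value = 1.0
    for i in subset:
        a, b, c = items[i]
        value *= icc_3pl(t, a, b, c)
    return value

# Hypothetical (a, b, c) triples for three items.
items = [(1.0, -1.0, 0.2), (1.5, 0.0, 0.2), (0.8, 1.0, 0.25)]
print(icc_3pl(0.0, *items[1]))
print(product_regression(0.0, items, (0, 1)))
```

At t = b the logistic term equals one half, so the ICC value is c + (1 - c)/2, which gives a quick check on any implementation.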

There are $2^n$ regression functions listed above. More can be
generated by taking linear combinations of the elementary formula scores and
then computing their regressions on the latent trait. For example, the
number-right score

$$
X = u_1 + u_2 + \cdots + u_n
$$

has the regression

$$
E(X \mid t) = \sum_{i=1}^{n} P_i(t).
$$

The collection of regression functions of all linear combinations of
elementary formula scores is called the canonical space of a test.

A major step in an MFS analysis of a test consists of finding a smaller
number of functions to represent the large number (in fact, an infinite
number) of functions in the canonical space. The smaller collection of
functions is called an orthonormal basis for the canonical space.
Selecting an orthonormal basis for the canonical space is analogous to
finding the principal components of a set of variables. In a principal
components analysis, the basic idea is to create a new set of variables, the
principal components, so that each of the original variables can be written
as a linear combination of the principal components plus a small residual.
A principal components analysis is valuable when there is a large number of
original variables and the first few principal components explain almost all
of their variance. In the same way functions in the canonical space are
written as linear combinations of the orthonormal basis functions. For
example, the ICC for the $i$th item can be written

$$
P_i(t) = \sum_{k=1}^{K} \alpha_k h_k(t),
$$

where $K$ functions, denoted $h_1(t), \ldots, h_K(t)$, are used in the
orthonormal basis and the $\alpha_k$ are the weights used in the linear
combination. If $K$ is sufficiently large, this representation is exact. If
only the first $J$ functions are used, instead of all $K$ functions (where
$J$ is less than $K$), then there is some error. However, the residual

$$
P_i(t) - \sum_{k=1}^{J} \alpha_k h_k(t) = \sum_{k=J+1}^{K} \alpha_k h_k(t)
$$

will be small if the $\alpha_k$ are small for values of $k$ larger than $J$.
In fact, the area under the squared residual is exactly
$\alpha_{J+1}^2 + \alpha_{J+2}^2 + \cdots + \alpha_K^2$.
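
The identity for the squared residual can be checked numerically. The sketch below is not the MFS basis-selection procedure, which is not described here; as a stand-in it orthonormalizes a few hypothetical ICCs by modified Gram-Schmidt, using a sum over an evenly spaced grid on [-4, 4] in place of the integral. The grid, the inner product weighting, and the item parameters are all assumptions made for illustration.

```python
import math

D = 1.702
GRID = [-4.0 + 8.0 * m / 400 for m in range(401)]
STEP = GRID[1] - GRID[0]

def inner(f, g):
    """Discrete stand-in for the integral of f(t) g(t) over the grid."""
    return STEP * sum(x * y for x, y in zip(f, g))

def icc(a, b, c):
    """Three-parameter logistic ICC evaluated at every grid point."""
    return [c + (1 - c) / (1 + math.exp(-D * a * (t - b))) for t in GRID]

def gram_schmidt(functions):
    """Orthonormalize the functions in the order given (modified Gram-Schmidt)."""
    basis = []
    for f in functions:
        r = list(f)
        for h in basis:
            w = inner(r, h)
            r = [x - w * y for x, y in zip(r, h)]
        norm = math.sqrt(inner(r, r))
        if norm > 1e-10:                  # skip functions already in the span
            basis.append([x / norm for x in r])
    return basis

curves = [icc(1.0, -1.0, 0.2), icc(1.5, 0.0, 0.2),
          icc(0.8, 1.0, 0.25), icc(2.0, 0.5, 0.15)]
basis = gram_schmidt(curves)              # K = len(basis) basis functions
target = curves[-1]                       # lies in the span, so K weights are exact
alpha = [inner(target, h) for h in basis]

J = 2                                     # keep only the first J basis functions
approx = [sum(alpha[k] * basis[k][m] for k in range(J)) for m in range(len(GRID))]
residual = [p - q for p, q in zip(target, approx)]
print(inner(residual, residual))          # area under the squared residual
print(sum(a * a for a in alpha[J:]))      # sum of the squared omitted weights
```

The two printed quantities agree up to rounding error, which is the orthonormality property the text relies on when it truncates the basis.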


In each MFS analysis a parsimonious representation of one or another
collection of functions in the canonical space is important. MFS provides
techniques that yield basis functions that give small $\alpha_k$ for large
$k$, at least for the collection of functions being analyzed. Most MFS
analyses require six to eight basis functions for an adequate representation
of the functions being studied. Ten were used in this study.

To  recapitulate,  the  analysis  begins  by  estimating  ICCs  from  the 
dichotomously  scored  item  responses.  Widely  available  programs  such  as 
LOGIST  and  BILOG  can  be  used  to  this  end.  The  estimated  ICCs  (and  the 
assumption  of  local  independence)  are  subsequently  used  to  define  the 
canonical  space.  Then  a  small  number  of  orthonormal  basis  functions  are 
selected  so  that  the  functions  in  the  canonical  space  are  well-approximated 
by  linear  combinations  of  the  orthonormal  basis  functions. 

The next step of the MFS analysis is to use the orthonormal basis
functions to represent the option characteristic curves (OCCs). For
technical reasons (see below), we first estimate orthonormal basis function
weights for conditional option characteristic curves (COCCs). A COCC gives
the probability of an option choice given that the person does not choose
the correct option. A COCC equals its associated OCC divided by
$(1 - P_i(\theta))$. Hence the COCCs for an item sum to 1 for all $\theta$
values whereas the OCCs sum to $1 - P_i(\theta)$, which becomes very small
for large $\theta$ values. Each option characteristic curve is then
represented as the product of two linear combinations of the $h_k$'s, namely
the representation of $1 - P_i$ and a COCC.

At this point the OCC can be represented by a single set of weights by
calculating weights $b_j$ such that $\sum_j b_j h_j(\cdot)$ is approximately
equal to $(1 - P_i)$ times the COCC value. (An exact representation is not
possible in general because a product of two functions in the canonical
space is not necessarily in the canonical space.)
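
The relation between COCCs and OCCs can be sketched as follows. The COCC shapes used here are invented placeholders (a normalized set of exponentials), chosen only so that they are positive and sum to 1; they are not estimates from any data set, and the option labels are hypothetical.

```python
import math

D = 1.702

def icc_3pl(t, a, b, c):
    """Three-parameter logistic ICC."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (t - b)))

def hypothetical_coccs(t):
    """Placeholder COCC values for three distractors and omitting; they sum to 1."""
    raw = {"B": math.exp(0.5 * t), "C": 1.0, "D": math.exp(-0.5 * t), "omit": 0.5}
    total = sum(raw.values())
    return {option: value / total for option, value in raw.items()}

def occs(t, a, b, c):
    """OCC values: each COCC value multiplied by 1 - P_i(t)."""
    q = 1.0 - icc_3pl(t, a, b, c)
    return {option: q * value for option, value in hypothetical_coccs(t).items()}

item = (1.2, 0.0, 0.2)                    # hypothetical (a, b, c)
for t in (-2.0, 0.0, 2.0):
    values = occs(t, *item)
    print(t, round(sum(values.values()), 4), round(1.0 - icc_3pl(t, *item), 4))
```

The printout shows the OCCs summing to 1 - P_i(t) at each ability, which shrinks toward zero at high ability while the COCCs themselves stay well away from zero.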

Since OCCs and COCCs were not included in the set of functions used to
define the canonical space, there is both the mathematical question of how
best to approximate the OCCs and COCCs by basis functions and the
substantive question of whether or not the basis functions can adequately
approximate OCCs and COCCs. The analysis proceeds item-by-item with the
weights for all the options (including omit as an option) to each item
simultaneously estimated by "marginal" maximum likelihood. The log
likelihood that is maximized with respect to the weights is


$$
L = \sum_{j=1}^{N} \log P(u^*_j, v^*_{ij}), \tag{1}
$$

where $u^*_j$ is a vector containing the dichotomously scored item responses
of the $j$th examinee and $v^*_{ij}$ indicates the particular option on item
$i$ selected by examinee $j$. For a four option multiple-choice item,
$v^*_{ij} = 1$ if option A is selected, ..., $v^*_{ij} = 4$ if option D is
selected, and $v^*_{ij} = 5$ if no response is made. Suppose all the items
are recoded so that option A is always the correct response. Then Equation 1
can be rewritten as

$$
L = \sum_{\substack{j=1 \\ v^*_{ij} = 1}}^{N} \log P(u^*_j)
  + \sum_{\substack{j=1 \\ v^*_{ij} \neq 1}}^{N} \log \int P(u^*_j \mid t)\,
    P(v^*_{ij} \mid t, u_{ij} = 0)\, f(t)\, dt, \tag{2}
$$

where

$$
P(u^*_j \mid t) = \prod_{i=1}^{n} P_i(t)^{u_{ij}} [1 - P_i(t)]^{1 - u_{ij}}, \tag{3}
$$

$$
P(v^*_{ij} \mid t, u_{ij} = 0) = \sum_{k=1}^{J} \alpha_k h_k(t), \tag{4}
$$

and $f(t)$ is the ability density. Notice that Equation 3 is the likelihood
function for the three-parameter logistic model (i.e., Lord's (1980)
Equation (4-20) and Hulin, Drasgow, and Parsons' (1983) Equation (2.6.2)). It
is the $\alpha$'s in Equation 4 that are to be estimated. Actually, each
option has its own set of $J$ $\alpha$'s, but to avoid notational complexity
we have not added another subscript to the $\alpha_k$'s.

It is important to observe that local independence is not used to
derive Equation 2 from Equation 1; only the definition of conditional
probability is used. Thus, even when skipping items or not reaching items
(response "5") fails to obey the assumption of local independence, an
accurate estimate of the conditional probability of non-response for
examinees at each ability level may be obtained.

Quadratic programming is used to obtain maximum likelihood estimates of
orthonormal basis function weights for the COCCs in Equation 4. The weights
$\alpha_k$ for the COCCs are easier to estimate than the weights for OCCs
since the OCCs for easy items and OCCs for rarely chosen options are close to
zero, which causes the $\alpha_k$ to become indeterminate; COCCs are not
usually close to zero. Since the OCC at $\theta = t$ is equal to the COCC
times $1 - P_i(t)$, the OCCs are available after the COCCs have been
obtained. The COCCs are intrinsically interesting as well as mathematically
tractable since their shapes can be used to study the properties of
effective distractors.

The quadratic programming methods used by Levine and Williams (1987)
are convenient because they allow plausible constraints to be placed on the
COCCs. One constraint is positivity: COCCs are not allowed to become
negative. In our analyses all COCCs were required to equal or exceed .001.
A second constraint placed on COCCs is smoothness: the COCCs were not
allowed to oscillate widely. The smoothness constraint can be implemented
by restricting the third derivative of the COCCs to be less than .00*5. This
condition can be thought of as requiring each small piece of the graph of
the COCC to have a very accurate quadratic approximation. (A restriction on
the second derivative would force the COCC to be locally linear and a first
derivative constraint would force the COCC to be locally constant.)

In summary, orthogonal basis functions $h_j(t)$ are derived from ICCs,
which are estimated by programs such as LOGIST or BILOG. COCCs are
represented as linear combinations of the basis functions in Eq. 4, and
marginal maximum likelihood estimates of the weights in this equation
are obtained. OCC values can then be obtained by multiplying COCC values
by $(1 - P_i)$.


Estimation  and  Information 

Data  set.  The  data  set  used  in  our  analyses  was  a  spaced  sample  of 
2978  examinees;  this  data  set  is  fully  described  in  the  Profile  of  American 
Youth (1982). These examinees answered the 30 item Armed Services
Vocational  Aptitude  Battery  (ASVAB)  Arithmetic  Reasoning  (AR)  test.  Each 
item  on  this  test  has  four  options. 

ICC estimation. The first step in the MFS analysis is to estimate
ICCs from the dichotomously scored item responses. To this end, the item
responses of the examinees described above were scored dichotomously. All
unanswered items were scored as incorrect (since skipping and not reaching
are treated as a separate, and incorrect, response option). Then the LOGIST
(version 2B) computer program (Wood, Wingersky, & Lord, 1976) was used to
estimate item and person parameters. Estimates of item discrimination
parameters ranged from about 0.5 to 2.0 and estimates of item difficulties
varied from about -3.0 to 1.4 (mean = .14, SD = .99).

for all 11,914 examinees in the American Youth data set, forming 25 ability
strata on the basis of estimated abilities by using the 4th, 8th, ..., 96th
percentile points of the standard normal distribution as cutting scores, and
then computing the proportion of examinees selecting each option among the
subset of examinees who answered the item incorrectly. The centers of the
vertical lines correspond to the observed proportions and they are plotted
above the category medians (the 2nd, 6th, ..., 98th percentile points of the
standard normal distribution). The vertical lines represent approximate 95%
confidence intervals for the observed proportions (± two standard errors,
where the observed proportion is used to compute the standard error).
Observed proportions of 0 and 1 are plotted as plus signs and are offset
slightly from their true locations so that they will be visible.
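
The stratification and the interval computation described above can be sketched as follows. The function names are hypothetical; the only inputs assumed are an estimated ability for each examinee and, within a stratum, the count of incorrect responders who chose a given option.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()
# Cutting scores at the 4th, 8th, ..., 96th percentile points: 24 cuts, 25 strata.
CUTS = [STD_NORMAL.inv_cdf(p / 100) for p in range(4, 100, 4)]
# Plotting positions at the stratum medians: 2nd, 6th, ..., 98th percentile points.
MEDIANS = [STD_NORMAL.inv_cdf(p / 100) for p in range(2, 100, 4)]

def stratum(theta_hat):
    """Index (0 to 24) of the ability stratum containing an estimated ability."""
    return sum(theta_hat > cut for cut in CUTS)

def proportion_with_interval(count, n_incorrect):
    """Observed proportion choosing an option among incorrect responders, with an
    approximate 95% interval of plus or minus two standard errors, where the
    standard error is computed from the observed proportion."""
    p = count / n_incorrect
    se = math.sqrt(p * (1.0 - p) / n_incorrect)
    return p, p - 2.0 * se, p + 2.0 * se

print(len(CUTS), len(MEDIANS), stratum(0.0))
print(proportion_with_interval(25, 100))
```

When the observed proportion is 0 or 1 the computed standard error is zero, which is why such points need separate treatment in the plots.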

The AR items seem to be more-or-less ordered by difficulty.
Consequently, the 95% confidence intervals for the first few items in
Appendix 1 are very wide because these items are easy and so few examinees
choose incorrect options. Confidence intervals for later items are much
narrower and provide a severe test for COCC estimates. Item 27, for
example, shows that the COCC estimates provide a very good description of
option choice. Notice that the COCC for the omit category lies below most
observed proportions. This occurs because examinees with high omitting
rates were excluded from the sample used to estimate COCCs, but were
included in the total sample used to compute the proportions displayed in
Appendix 1.

COCC estimation verification. The figures presented in Appendix 1 show
that MFS estimates of COCCs closely follow the actual patterns of item
responses. It is difficult, however, to understand the accuracy of COCC
estimates from these figures because the true COCCs are not known. To gain
further insights into the properties of MFS estimates of COCCs, a simulation
data set of 3000 response patterns was generated. Simulated abilities were
sampled from the standard normal distribution, probabilities of correct and
incorrect responses were determined from the ICCs obtained by the LOGIST run
described previously, and probabilities of option selections (for responses
simulated to be incorrect) were computed using the MFS estimated COCCs.
Thus, the assumptions used to estimate COCCs correspond exactly to the way
in which the data set was generated.
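
The data generation scheme just described can be sketched as below. The item parameters and COCC shapes are invented placeholders rather than the LOGIST and MFS estimates used in the study, and only three items are simulated to keep the example short. Responses are coded 1 for the correct option, 2 to 4 for the distractors, and 5 for no response, following the recoding used with Equation 1.

```python
import math
import random

D = 1.702

def icc_3pl(t, a, b, c):
    """Three-parameter logistic ICC."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (t - b)))

def hypothetical_cocc(t):
    """Placeholder COCCs for options 2-4 and omitting (5); values sum to 1."""
    raw = {2: math.exp(0.5 * t), 3: 1.0, 4: math.exp(-0.5 * t), 5: 0.5}
    total = sum(raw.values())
    return {option: value / total for option, value in raw.items()}

def simulate_response(rng, t, item):
    """Return 1 if the simulated response is correct; otherwise draw an
    incorrect option (or omit) with the COCC probabilities at ability t."""
    a, b, c = item
    if rng.random() < icc_3pl(t, a, b, c):
        return 1
    options, probabilities = zip(*hypothetical_cocc(t).items())
    return rng.choices(options, weights=probabilities, k=1)[0]

rng = random.Random(0)
items = [(1.0, -1.0, 0.2), (1.5, 0.0, 0.2), (0.8, 1.0, 0.25)]  # hypothetical
patterns = []
for _ in range(3000):
    t = rng.gauss(0.0, 1.0)               # ability from the standard normal
    patterns.append([simulate_response(rng, t, item) for item in items])
print(patterns[:3])
```

Because omitting here depends on ability alone, the sketch shares the unidimensional character of the simulation in the text.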

COCCs  were  re-estimated  from  the  simulation  data  set.  The  true  ability 
density  (the  standard  normal)  was  used  in  Equation  2  and  the  true  ICC  values 
were used to compute probabilities of correct and incorrect responses. The
true  ability  density  and  ICC  values  were  used  because  we  wanted  to  determine 
the  errors  of  COCC  estimates  in  a  way  that  was  not  confounded  with 
inaccuracies  in  density  estimates  and  ICC  estimates. 

The  results  of  the  simulation  study  are  shown  in  Appendix  2,  which 
presents  the  re-estimated  COCCs  for  all  30  items.  Heavy  lines  indicate  the 
re-estimated  COCCs  and  thin  lines  indicate  the  true  COCCs.  Observed 
proportions  and  their  approximate  95%  confidence  intervals  are  shown  for  the 
simulation sample of N = 3000. The observed proportions are not plotted if
five  or  fewer  incorrect  responses  were  made  in  an  ability  stratum. 

Item 2 shows estimated COCCs that are very close to the true COCCs for
all ability levels. This is remarkable because there were almost no
incorrect responses made by simulated examinees with above average ability.
Item 3 shows that we cannot always expect to have well-estimated COCCs when
there are no data available: large differences between true and estimated
COCCs occur at high ability levels. The COCCs are, however, accurately
estimated in ability ranges for which there were very few incorrect
responses.

From  an  inspection  of  the  plots  in  Appendix  2  it  seems  evident  that 
COCC  values  are  accurately  estimated  when  there  are  six  or  more  incorrect 
responses  in  adjacent  ability  strata.  Sometimes  COCC  values  are  well- 
estimated  when  fewer  incorrect  responses  are  available,  but  this  seems  to  be 
a  matter  of  chance.  Notice,  also,  that  COCCs  for  the  omit  option  are  not 
underestimated  in  this  analysis  as  they  were  in  the  analysis  of  the  real  AR 
data. In the present analysis, all response vectors were used; there was no
restriction  on  omitting  as  in  the  previous  analysis.  In  this  simulation 
study  data  were  unidimensional  in  the  sense  that  the  probability  of  omitting 
depended  only  on  ability,  although  it  was  permitted  to  vary  from  item  to 
item.  It  would  have  been  more  realistic  to  use  a  two  dimensional  simulation 
model  with  examinees  varying  both  in  ability  and  tendency  to  omit. 

Information function. Information functions for the dichotomous and
polychotomous modelings of the AR test are shown in Figure 2. An expression
for the information function of the three-parameter logistic model is

(5)

The information function of the polychotomous model is

for both the dichotomous and polychotomous scorings, namely, the first term
on the right sides of Equations 5 and 6. Thus, any differences in
information are entirely due to the treatment of incorrect responses. Using
Jensen's inequality (Halmos, 1950) it can be shown that

$$
\sum_{j=2}^{J} \frac{[P'_{ij}(t)]^2}{P_{ij}(t)} \ge \frac{[Q'_i(t)]^2}{Q_i(t)}
$$

(cf. Samejima, 1969; Park, 1983). Thus, any increase in information is
entirely due to polychotomous scoring.
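
The inequality can be illustrated numerically for a single item. The sketch below assumes the usual Fisher information for a multinomial response, namely the sum over response categories of the squared derivative of the category probability divided by that probability; this form is an assumption adopted for illustration and is not quoted from the equations above. The COCC shapes and item parameters are invented, and derivatives are taken by central differences.

```python
import math

D = 1.702

def icc_3pl(t, a, b, c):
    """Three-parameter logistic ICC."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (t - b)))

def hypothetical_cocc(t):
    """Placeholder COCCs for options 2-4 and omitting (5); values sum to 1."""
    raw = {2: math.exp(0.5 * t), 3: 1.0, 4: math.exp(-0.5 * t), 5: 0.5}
    total = sum(raw.values())
    return {option: value / total for option, value in raw.items()}

def derivative(f, t, h=1e-5):
    """Central-difference approximation to f'(t)."""
    return (f(t + h) - f(t - h)) / (2.0 * h)

def item_information(t, item):
    """Return (dichotomous, polychotomous) information for one item at ability t,
    assuming the multinomial form: sum over categories of P'(t)^2 / P(t)."""
    a, b, c = item

    def p(x):
        return icc_3pl(x, a, b, c)

    def q(x):
        return 1.0 - p(x)

    correct_term = derivative(p, t) ** 2 / p(t)
    dichotomous = correct_term + derivative(q, t) ** 2 / q(t)
    polychotomous = correct_term
    for option in hypothetical_cocc(t):
        def occ(x, option=option):
            return q(x) * hypothetical_cocc(x)[option]
        polychotomous += derivative(occ, t) ** 2 / occ(t)
    return dichotomous, polychotomous

item = (1.5, 0.0, 0.2)                    # hypothetical (a, b, c)
for t in (-2.0, -1.0, 0.0, 1.0, 2.0):
    dich, poly = item_information(t, item)
    print(t, round(dich, 4), round(poly, 4))
```

In this sketch the correct-response term is shared by the two scorings, and the gain comes only from how the incorrect-response probability is split among options; the gain vanishes if the COCCs do not vary with ability.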


Insert  Figure  2  about  here 


Figure  2  shows  that  there  are  moderate  gains  in  information  due  to 
polychotomous  scoring  of  the  AR  items  for  low  to  moderately  high  abilities. 
Little or no information is gained for high ability examinees; this latter
finding  is  not  surprising  because  high  ability  examinees  are  expected  to 
answer  nearly  all  the  items  correctly. 

It  should  be  noted  that  the  AR  items  were  not  written  with 
polychotomous  scoring  in  mind  and  so  the  gains  in  information  shown  in 
Figure  2  are  more-or-less  accidental.  Larger  gains  might  be  realized  if 
item  writers  knew  the  attributes  of  incorrect  options  that  typically  lead  to 
substantial increases in information.


Appropriateness Measurement


Purpose 

In this section we compare the effectiveness of dichotomous and
polychotomous models for detecting aberrant response patterns. By
comparing  detection  rates  of  optimal  indices  it  is  possible  to  compare  the 
maximum  detection  rates  possible  for  a  given  form  of  aberrance.  In  this 
section,  as  in  the  previous  section,  the  dichotomous  model  is  a  submodel  of 
the  polychotomous  model;  hence  any  increase  in  detection  rates  is  due  to 
modeling  incorrect  responses. 

For  an  optimal  index  to  be  truly  optimal,  it  must  be  computed  from  the 
true  ICCs  or  OCCs  and,  therefore,  the  optimal  indices  for  dichotomous  and 
polychotomous  scorings  of  the  simulation  data  were  computed  using  the 
simulation  ICCs  and  OCCs.  In  any  practical  application,  however,  only 
estimated  ICCs  and  OCCs  will  be  available.  Consequently,  we  decided  to 
examine  one  aspect  of  the  robustness  of  optimal  indices  by  computing  the 
optimal  index  for  dichotomously  scored  responses  using  ICCs  estimated  by  the 
LOGIST (Wood, Wingersky, & Lord, 1976) computer program. Further research
designed  to  develop  extensions  of  optimal  indices  for  use  in  practical 
settings  will  be  warranted  if  the  optimal  indices  computed  from  estimated 
ICCs  are  found  to  be  nearly  as  powerful  as  optimal  indices  computed  from  the 
true  ICCs. 

Several practical indices were also evaluated. Most of these indices
were computed from the dichotomously scored item responses. One index,
however, is the natural extension of a dichotomous model index to the
polychotomous case. Detection rates for the practical indices indicated (1)
which were relatively more powerful and less powerful; and (2) the extent to
which the maximum detection rates were attained.

Overview

The ICCs and OCCs estimated for the AR test from the sample of N =
2,978 were used as the "true" item parameters in a simulation study.
Initially, a sample of N = 3000 simulated response patterns was created and
used as a test norming sample. This data set was used to determine the item
and test statistics required to compute all but one ($z_p$) of the practical
appropriateness indices listed in the next section. Then a normal sample of
N = 4000 response vectors was created. In addition, sixteen aberrant
samples of N = 2000 were generated to simulate several forms of aberrance.
Optimal indices and all the practical indices were then computed for the
normal sample and aberrant samples. Rates of detection of aberrant
response vectors at various false alarm rates were determined for each
appropriateness index and each form of aberrance.
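
The last step, turning index values into a detection rate at a fixed false alarm rate, can be sketched as follows. The sketch assumes that large index values signal aberrance (an index for which small values signal aberrance would be negated first) and that the cutoff is taken from the empirical distribution of the index in the normal sample. The function name and the simulated index values are hypothetical.

```python
import random

def detection_rate(normal_values, aberrant_values, false_alarm_rate):
    """Proportion of aberrant index values exceeding the cutoff that the stated
    proportion of normal index values exceed (assumes large = aberrant)."""
    ordered = sorted(normal_values, reverse=True)
    n_false_alarms = min(round(false_alarm_rate * len(ordered)), len(ordered) - 1)
    cutoff = ordered[n_false_alarms]      # values strictly above this are flagged
    flagged = sum(value > cutoff for value in aberrant_values)
    return flagged / len(aberrant_values)

# Made-up index values standing in for a normal sample of 4000 and an
# aberrant sample of 2000.
rng = random.Random(1)
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(4000)]
aberrant_sample = [rng.gauss(1.5, 1.0) for _ in range(2000)]
for rate in (0.01, 0.05, 0.10):
    print(rate, round(detection_rate(normal_sample, aberrant_sample, rate), 3))
```

Repeating the calculation over a range of false alarm rates traces out the kind of detection-rate comparison the study reports for each index and form of aberrance.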

Appropriateness  Indices 

In  this  section  we  list  the  appropriateness  indices  that  are  evaluated. 
For  the  sake  of  brevity  we  shall  not  provide  extensive  technical  detail. 

This  information  is  given  by  Levine  and  Drasgow  (1984;  1987)  for  optimal 
indices  and  by  Drasgow,  Levine,  and  McLaughlin  (1987)  for  practical  indices. 
Additional  references  are  given  when  appropriate. 

Polychotomous model optimal indices ($\mathrm{LR}_p$). Levine and Drasgow
(1984) used the Neyman-Pearson lemma to derive a class of most powerful
appropriateness indices. These indices require the probabilities of
observing the polychotomously scored response vector $v^*$ assuming that it
was generated by a normal process ($P_{\mathrm{Normal}}$) and assuming that
it was generated by a specified aberrant process ($P_{\mathrm{Aberrant}}$).
The decision procedure that classifies response vectors as aberrant when

$$
\frac{P_{\mathrm{Aberrant}}(v^*)}{P_{\mathrm{Normal}}(v^*)} > \text{constant},
$$

where the constant is chosen to control the false alarm rate or Type I error
rate, is at least as powerful as any other test. Thus, the polychotomous
model optimal indices studied here have the form

$$
\mathrm{LR}_p = \frac{P_{\mathrm{Aberrant}}(v^*)}{P_{\mathrm{Normal}}(v^*)},
$$

where the probabilities are computed using three-parameter logistic ICCs to
determine conditional probabilities of correct responses and MFS OCCs to
determine conditional probabilities of incorrect responses. Technical
details about the form of $\mathrm{LR}_p$ for specific types of aberrance and
an efficient computing algorithm are given by Levine and Drasgow (1984;
1987).

Dichotomous model optimal indices ($\mathrm{LR}_3$). These indices are
identical to the $\mathrm{LR}_p$ indices except that the only information
used in their calculation is the pattern of correct and incorrect responses
$u^*$, i.e.,
the  dichotomously  scored  item  responses.  This  class  of  indices,  therefore, 
provides  the  highest  rates  of  detection  when  the  choice  of  incorrect  option 
is  ignored. 

Dichotomous  model  optimal  indices  computed  using  estimated  item 
parameters  (LRJ).  For  optimal  indices  to  be  truly  optimal  they  must  be 
computed  using  item  parameters  —  not  item  parameter  estimates.  In  previous 
work  (Levine  &  Drasgow,  1982),  we  found  that  the  values  of  some 
appropriateness  indices  were  almost  unaffected  when  item  parameter  estimates 
were  used  in  place  of  item  parameters.  In  the  present  research,  optimal 
indices  for  the  three-parameter  logistic  model  were  also  computed  using 
estimated  item  parameters. 

Dichotomous and polychotomous model standardized l_0 (z_3 and z_p).
These indices, originally developed by Drasgow, Levine, and E. Williams
(1985), are well-standardized (i.e., their conditional distributions given
ability are nearly invariant across ability levels) and are, therefore,
well-suited to practical applications. In essence, they compare the
likelihood of a vector of item responses to the expected likelihood given
the examinee's ability estimate. In previous research (Levine & Rubin,
1979; Levine & Drasgow, 1982; Drasgow, Levine, & E. Williams, 1985), it has
been found that aberrant response vectors tend to have likelihoods that are
smaller than expected of normal response vectors, and thus, the standardized
likelihoods z_3 and z_p serve as effective appropriateness indices.
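As a rough illustration of how a standardized likelihood index of this kind can be computed for dichotomous responses, the sketch below standardizes the log likelihood of a response vector by its conditional mean and variance, evaluated at the model probabilities for the examinee's ability estimate. The function name and the input format (a list of 0/1 responses and a list of probabilities) are assumptions made for this example.

```python
import math

def standardized_log_likelihood(u, p):
    """Return (l0 - E[l0]) / sqrt(Var[l0]) for 0/1 responses u.

    p[i] is the model probability of a correct response to item i,
    evaluated at the examinee's ability estimate.
    """
    l0 = expected = variance = 0.0
    for u_i, p_i in zip(u, p):
        q_i = 1.0 - p_i
        l0 += u_i * math.log(p_i) + (1 - u_i) * math.log(q_i)
        expected += p_i * math.log(p_i) + q_i * math.log(q_i)
        variance += p_i * q_i * math.log(p_i / q_i) ** 2
    return (l0 - expected) / math.sqrt(variance)
```

Large negative values flag response vectors whose likelihood is smaller than expected. Note that the variance is zero if every probability equals .5, so the sketch assumes at least one item with a probability different from .5.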

Fit statistics (F1 and F2). Two fit statistics suggested by Rudner
(1983) as generalizations of Rasch model fit statistics used by Wright and
his colleagues are

$$ F1 = \frac{1}{n} \sum_{i=1}^{n} \frac{[u_i - P_i(\hat{\theta})]^2}{P_i(\hat{\theta})\, Q_i(\hat{\theta})} $$

and

$$ F2 = \frac{\sum_{i=1}^{n} [u_i - P_i(\hat{\theta})]^2}{\sum_{i=1}^{n} P_i(\hat{\theta})\, Q_i(\hat{\theta})} . $$

Notice that F1 and F2 tend to be large when an examinee misses items
($u_i = 0$) that should be answered correctly ($P_i(\hat{\theta})$ near 1)
and correctly answers ($u_i = 1$) items that should be very difficult
($P_i(\hat{\theta})$ near 0). Drasgow, Levine, and McLaughlin (1987) found
F2 to be well-standardized. F1, however, was badly standardized because
relatively many large values were observed for simulated normal, high
ability examinees.
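A minimal sketch of the two fit statistics follows. It reads F1 as the mean of squared standardized residuals and F2 as the ratio of summed squared residuals to summed binomial variances; that reading, the function name, and the input format are assumptions made for illustration.

```python
def fit_statistics(u, p):
    """Return (F1, F2) for 0/1 responses u and model probabilities p."""
    n = len(u)
    squared = [(u_i - p_i) ** 2 for u_i, p_i in zip(u, p)]  # squared residuals
    variances = [p_i * (1.0 - p_i) for p_i in p]            # P_i * Q_i
    f1 = sum(s / v for s, v in zip(squared, variances)) / n
    f2 = sum(squared) / sum(variances)
    return f1, f2
```

For an examinee who misses a very easy item and answers a very hard one correctly, both statistics are large; for the reverse pattern both are small.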

Caution indices (S, T2, and T4). Three "caution" indices were
evaluated. The first is the original Sato caution index S described by
Sato (1975) and Tatsuoka and Linn (1983). The other two caution indices are
the second (T2) and fourth (T4) standardized extended caution indices
developed by Tatsuoka (1984). Drasgow, Levine, and McLaughlin (1987) found
T4 to be better standardized than T2, so T4 should be preferred when
their detection rates are comparable.

Likelihood function curvature statistics (JK and O/E). It is expected
that the likelihood function will be "flatter" for aberrant response vectors
than normal response vectors at the maximum likelihood ability estimate
$\hat{\theta}$. Two indices that provide measures of the flatness of the
likelihood function were evaluated. The first (JK) is a normalized jackknife
estimate of the variance of $\hat{\theta}$ and the second (O/E) is the ratio
of the observed and expected information about ability contained in the
dichotomously scored item responses. Both of these indices are described by
Drasgow, Levine, and McLaughlin (1987), who showed that JK and O/E are well
standardized.

Method 

Data Sets. A test norming sample of 3000 response vectors was created
by sampling 3000 numbers ($\theta$'s) from the normal (0,1) distribution
truncated to the [-5.0, 3.5] interval. A normal sample of 4000 response
vectors was also generated in this way. Two thousand aberrant response
vectors were created in each of sixteen conditions. These conditions
resulted from varying three factors: the type of aberrance (spuriously
high; spuriously low), the severity of aberrance (mild; moderate), and the
distribution from which simulated abilities were sampled.

Eight of the aberrant samples contained spuriously high response
vectors and the remaining eight samples contained spuriously low response
vectors. Spuriously high response patterns were created by first
generating normal response vectors (using the AR three-parameter logistic
ICCs to determine the probabilities of correct responses and the AR COCCs to
determine probabilities of incorrect option selection) and then replacing a
given percentage k of simulated responses (randomly sampled without
replacement) with correct responses. Spuriously low response patterns were
also created by first generating normal response vectors. Then a fixed
percentage of items were randomly selected without replacement and the
responses to these items replaced with random responses (i.e., a response
was replaced by option A with probability .25, by option B with
probability .25, ..., and by option D with probability .25). Mildly
aberrant response patterns were generated by using k = 17% (i.e., 5 of 30
items). Moderately aberrant response patterns were created using k = 33%
(i.e., 10 of 30 items).
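The two manipulations described above can be sketched as follows. The sketch assumes four-option items labeled A through D and takes an already generated normal response vector as input; the function names and the `key` argument (the list of correct options) are hypothetical names introduced for the example.

```python
import random

def spuriously_high(responses, key, k, rng):
    """Replace round(k * n) randomly chosen responses with the correct option."""
    out = list(responses)
    n_replace = round(k * len(out))          # e.g. round(0.17 * 30) == 5
    for i in rng.sample(range(len(out)), n_replace):
        out[i] = key[i]
    return out

def spuriously_low(responses, k, rng, options="ABCD"):
    """Replace round(k * n) randomly chosen responses with random options."""
    out = list(responses)
    n_replace = round(k * len(out))          # e.g. round(0.33 * 30) == 10
    for i in rng.sample(range(len(out)), n_replace):
        out[i] = rng.choice(options)         # each option has probability .25
    return out
```

Because `rng.sample` draws positions without replacement, exactly 5 (or 10) of 30 items are altered, although a random replacement can coincide with the response it replaces.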

The  third  variable  manipulated  was  the  ability  level  of  the  aberrant 
sample.  Abilities  for  the  spuriously  high  samples  were  sampled  from  four 
parts  of  the  normal  (0,1)  distribution  truncated  to  [-5.0,  3.5]:  very  low 
(0th  through  9th  percentiles),  low  (10th  through  30th  percentiles),  low 
average  (31st  through  48th  percentiles),  and  high  average  (49th  to  64th 
percentiles).  In  all  cases  percentile  points  were  determined  after  the 
truncation  to  [-5.0,  3.5].  These  intervals  were  used  because  it  is  more 
important  to  detect  spuriously  high  response  patterns  for  low  ability 
examinees  than  for  high  ability  examinees.  Similarly,  it  is  more  important 
to  detect  spuriously  low  responses  by  high  ability  examinees.  Consequently, 
abilities  were  sampled  from  four  average  to  high  ability  strata  for  the 
spuriously  low  samples:  very  high  (93rd  percentile  and  above),  high  (65th 
through  92nd  percentiles),  high  average  (49th  through  64th  percentiles),  and 
low  average  (31st  to  48th  percentiles).  The  ability  percentiles  used  here 
correspond to the percentiles forming United States Armed Services Vocational
Aptitude Battery mental categories.
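One way to draw abilities from a percentile stratum of the truncated normal distribution is inverse-CDF sampling, sketched below with the standard library. How the percentile boundaries are closed (for example, whether the 9th percentile stratum ends at 9 or 10) is left to the caller; the function names are assumptions introduced here.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()   # normal (0,1)
LO, HI = -5.0, 3.5          # truncation interval used in the study

def truncated_quantile(pct):
    """Ability at percentile pct (0-100) of N(0,1) truncated to [LO, HI]."""
    f_lo, f_hi = STD_NORMAL.cdf(LO), STD_NORMAL.cdf(HI)
    return STD_NORMAL.inv_cdf(f_lo + (pct / 100.0) * (f_hi - f_lo))

def sample_stratum(pct_lo, pct_hi, n, rng):
    """Draw n abilities whose percentile ranks are uniform on [pct_lo, pct_hi]."""
    return [truncated_quantile(rng.uniform(pct_lo, pct_hi)) for _ in range(n)]
```

For example, `sample_stratum(31, 48, 2000, random.Random(1))` would produce a low average sample in which every ability lies between the 31st and 48th percentile points computed after truncation.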
Analysis. All the item and test statistics required to compute the
practical appropriateness indices were computed using the test norming
sample. These quantities were computed as the first step in the analysis
and then used in all subsequent analyses. LOGIST (Wood, Wingersky, & Lord,
1976) was used to estimate three-parameter logistic item parameters and a
Fortran program was written to compute the other quantities required.

The practical appropriateness indices and LR_3' were then computed for
the 4000 response vectors in the normal sample. The item and test
statistics estimated from the test norming sample were used in these
calculations. This procedure simulates the process by which practical
appropriateness indices would be computed in many applications. Optimal
indices were also computed for the normal sample for four aberrant
conditions: 17% spuriously high, 33% spuriously high, 17% spuriously low,
and 33% spuriously low. The ICCs and COCCs used to generate the data were
used to compute LR_p and LR_3.

The practical appropriateness indices were computed for each of the 16
aberrant samples. In addition, the 17% spuriously high optimal index was
computed for the four samples with this form of aberrance, the 33%
spuriously high optimal index was computed for the four samples with this
form of aberrance, etc. The proper interpretation of the optimal indices
computed in the present research is the following: They are optimal for the
specified form of aberrance, say 17% spuriously high, in a population where
the ability density is a truncated normal for both the normal and aberrant
populations and a response vector is either normal or 17% spuriously high.
The normal group does in fact have this ability distribution. By
stratifying on a subinterval of [-5.0, 3.5] for the aberrant group, we
determined the power of the index that is optimal for the population as a
whole in a particular subpopulation.

Evaluation Criteria. The main criteria for evaluating the
appropriateness indices were the proportions of aberrant response patterns
correctly identified as aberrant when various proportions of normal response
patterns were misclassified as aberrant. These proportions shall be
presented for all 16 aberrance conditions. This allows us to determine what
types of aberrant response patterns have acceptably high detection rates
using optimal methods and using practical methods. The characteristics of
response patterns that cannot be detected are evident as a consequence of
examining the 16 aberrance conditions separately.
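The evaluation criterion amounts to reading off points on an ROC curve: fix a cutoff so that a chosen proportion of normal response patterns is misclassified, then count the aberrant patterns beyond that cutoff. A minimal sketch, assuming that larger index values indicate aberrance and that ties are negligible, is given below; the function name is hypothetical.

```python
import math

def detection_rate(normal_scores, aberrant_scores, false_alarm_rate):
    """Proportion of aberrant scores above the cutoff set in the normal sample."""
    ranked = sorted(normal_scores, reverse=True)
    # Number of normal patterns allowed to exceed the cutoff (small guard
    # against floating-point products such as 0.02 * 4000 falling below 80).
    n_false = math.floor(false_alarm_rate * len(ranked) + 1e-9)
    cutoff = ranked[n_false]
    hits = sum(1 for score in aberrant_scores if score > cutoff)
    return hits / len(aberrant_scores)
```

For indices on which small values indicate aberrance (such as a standardized likelihood), the scores can be negated before calling the function.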

Results 

The results for the spuriously high conditions are given in Tables 1
through 4. The results for the lowest ability group are shown in Table 1.
In this table it is evident that cheating on five randomly selected items is
not very detectable: At a 2% false alarm rate only 28% of the simulated
cheaters are detected by the optimal LR_p index. The best of the practical
indices, z_3 and F2, detect 18% and 20%, respectively. Cheating on 10
items (the 33% condition) is reasonably detectable. For example, LR_p
detects 61% and LR_3 detects 54% at a 2% false alarm rate. At this false
alarm rate, z_3, F2, and T4 detect 44%, 41%, and 50%, respectively.
Finally, detection rates for optimal indices computed from true and
estimated ICCs are very similar for almost all false alarm rates in Table 1.


Insert  Tables  1  through  4  about  here 
provided by tests by writing items with highly informative incorrect
options.

An appropriateness measurement simulation study was also conducted to
compare the polychotomous model with a dichotomous submodel, namely the
three-parameter logistic. Several important results were obtained. First,
for the spuriously low treatment that simulates atypical educations,
misgridding answers to a portion of the test, unusual creativity, etc., we
found that optimal three-parameter logistic appropriateness indices fell far
short of their optimal polychotomous model counterparts. At some false
alarm rates, the rates of detection of aberrant response vectors were more
than 100% higher for the polychotomous optimal indices. Thus
appropriateness measurement constitutes one important practical testing
problem where substantial gains are made by the use of a polychotomous item
response model.

The results of the appropriateness measurement simulation study also
showed that the practical polychotomous model index was not a
particularly good index: Its detection rates were not close to optimal for
either spuriously high or spuriously low treatments. This result, in
conjunction with the results described previously, points to the need to
devise better polychotomous appropriateness indices that can be used in
practical situations.


A third result obtained in the appropriateness measurement research is
that the z_3, F2, and T4 indices effectively detect aberrance in
relation to three-parameter logistic optimal indices (but not polychotomous
model optimal indices). Therefore, if one is satisfied with dichotomous
scoring of item responses for some particular application, then z_3, F2,
and T4 can be used with confidence to detect inappropriate test scores.
Means for implementing appropriateness measurement in practical settings are
discussed by Drasgow and Guertler (1987).

Finally, the LR_3' indices provided detection rates that were nearly as
high as the rates provided by the optimal LR_3 indices. Thus, the
three-parameter logistic optimal indices seem to be robust to item parameter
estimation error. This result is surprising because extensive computations
are required to evaluate LR_3'; small errors (in ICC values) would be
expected to grow progressively larger as the computations progressed.
Nonetheless, only small differences between values of LR_3 and LR_3' were
observed for individual response patterns. Thus, we are encouraged to
continue research on "almost-optimal" indices that are based on likelihood
ratios and could be used in practical settings.

Conclusion 

COCC estimation provides opportunities to improve testing in a variety
of ways: ability estimation, the theory and practice of item writing, and
appropriateness measurement. Applications in areas such as item and test
bias and adaptive testing may also be fruitful. Consequently, we conclude
that there is information in incorrect responses and that polychotomous item
response models can make important contributions to psychological testing.
Table 1

Selected ROC Points for Spuriously High
Response Patterns Generated from the 0-9% Ability Range

False
Alarm                           Proportion Detected by
Rate    LR_p  LR_3 LR_3'   z_p   z_3    F1    F2     S    T2    T4    JK   O/E

17% Spuriously High Treatment
.001      04    04    --    00    03    00    --    00    00    01    00    00
.005      --    --    --    --    --    00    08    00    04    04    --    02
.01       16    19    17    --    12    --    13    03    --    --    03    04
.02       28    29    26    08    18    04    20    12    13    11    --    07
.03       34    33    30    11    25    07    24    18    18    14    09    09
.04       38    37    34    13    29    10    28    24    22    18    13    12
.05       43    --    38    15    33    15    32    27    26    22    15    14
.07       --    45    44    19    41    --    --    --    32    26    22    19
.10       52    50    49    26    51    36    50    49    --    33    29    25

33% Spuriously High Treatment
.001      23    --    17    02    10    00    --    00    06    12    00    00
.005      40    --    --    07    --    00    15    00    28    27    00    04
.01       45    45    43    12    30    01    27    06    37    34    00    09
.02       61    --    52    17    44    05    --    17    50    46    01    17
.03       67    59    58    22    50    16    47    24    60    52    02    24
.04       71    --    63    25    56    23    55    32    65    57    03    37
.05       72    67    66    31    62    30    59    37    69    61    03    37
.07       77    71    70    37    66    42    68    47    74    67    07    47
.10       81    75    75    46    75    57    76    60    81    73    19    57

Table 2

Selected ROC Points for Spuriously High
Response Patterns Generated from the 10-30% Ability Range

False
Alarm                           Proportion Detected by
Rate    LR_p  LR_3 LR_3'   z_p   z_3    F1    F2     S    T2    T4    JK   O/E

17% Spuriously High Treatment
.001      --    01    00    00    02    00    00    00    00    01    00    00
.005      09    07    07    01    05    00    03    00    05    04    00    01
.01       14    14    14    04    09    00    06    00    07    07    00    03
.02       26    25    22    06    14    01    11    04    14    12    01    05
.03       31    29    29    08    19    03    14    06    20    16    02    07
.04       34    33    33    10    23    06    18    10    24    20    02    10
.05       40    36    37    12    27    09    21    12    27    23    03    14
.07       46    43    43    16    34    14    27    18    33    28    06    20
.10       52    50    51    23    43    24    37    28    42    34    14    27

33% Spuriously High Treatment
.001      16    16    13    00    04    00    00    00    03    09    00    01
.005      31    27    23    03    14    00    07    00    20    23    00    06
.01       37    40    39    05    20    00    15    01    28    29    00    10
.02       53    50    50    08    30    03    27    06    42    41    00    20
.03       61    56    57    12    37    08    34    10    51    47    00    27
.04       65    63    62    14    42    12    42    16    58    53    00    34
.05       68    66    65    19    49    17    46    20    62    58    00    40
.07       73    70    70    25    54    28    56    29    67    63    05    51
.10       78    74    75    33    64    44    67    41    74    70    18    60

Table 3

Selected ROC Points for Spuriously High
Response Patterns Generated from the 31-48% Ability Range

False
Alarm                           Proportion Detected by
Rate    LR_p  LR_3 LR_3'   z_p   z_3    F1    F2     S    T2    T4    JK   O/E

17% Spuriously High Treatment
.001      00    00    00    00    01    00    00    00    00    01    00    00
.005      03    03    04    00    03    00    01    00    04    04    00    00
.01       06    07    08    02    06    00    02    00    06    06    00    01
.02       15    15    14    03    09    00    05    00    12    12    00    05
.03       20    19    19    05    14    03    07    02    17    15    00    08
.04       24    23    24    06    17    06    10    03    21    18    00    10
.05       29    26    28    07    20    07    13    04    23    22    00    13
.07       36    34    35    10    25    12    18    07    29    26    01    20
.10       43    42    43    15    33    18    26    12    36    32    07    29

33% Spuriously High Treatment
.001      06    10    07    00    02    00    00    00    02    06    00    01
.005      17    16    14    01    07    00    03    00    12    16    00    05
.01       22    27    26    02    10    00    08    00    18    22    00    08
.02       39    36    37    04    17    04    16    02    27    32    00    17
.03       48    43    45    05    22    08    21    05    36    38    00    23
.04       53    51    49    07    27    12    27    07    41    43    00    29
.05       56    55    54    09    33    16    31    09    45    47    00    34
.07       63    61    61    13    37    23    40    14    50    53    07    44
.10       71    67    68    20    46    36    51    22    59    60    19    53

Table 4

Selected ROC Points for Spuriously High
Response Patterns Generated from the 49-64% Ability Range

False
Alarm                           Proportion Detected by
Rate    LR_p  LR_3 LR_3'   z_p   z_3    F1    F2     S    T2    T4    JK   O/E

17% Spuriously High Treatment
.001      00    00    00    00    00    00    00    00    00    00    00    00
.005      00    00    00    00    01    00    00    00    02    03    00    00
.01       02    01    01    00    03    01    01    00    04    04    00    00
.02       07    06    03    01    05    01    03    00    07    08    00    00
.03       11    09    05    01    08    04    04    00    11    11    00    06
.04       14    13    07    02    10    06    07    01    14    14    00    09
.05       18    16    09    03    13    08    08    01    16    17    00    12
.07       25    23    14    06    17    11    13    03    20    21    01    17
.10       33    30    23    09    23    16    19    05    26    27    07    24

33% Spuriously High Treatment
.001      01    02    00    00    00    00    00    00    00    02    00    00
.005      05    04    02    03    03    00    01    00    05    07    00    01
.01       08    10    05    00    04    02    04    00    07    10    00    02
.02       19    16    09    01    07    07    08    01    12    17    00    06
.03       28    23    14    03    10    11    11    02    16    20    00    08
.04       34    32    18    03    12    14    15    03    20    25    00    11
.05       37    37    21    05    16    17    17    04    23    29    00    14
.07       48    45    29    08    19    23    23    07    28    35    03    20
.10       60    55    41    13    25    31    31    12    35    40    11    28

Table 5

Selected ROC Points for Spuriously Low
Response Patterns Generated from the 31-48% Ability Range

False
Alarm                           Proportion Detected by
Rate    LR_p  LR_3 LR_3'   z_p   z_3    F1    F2     S    T2    T4    JK   O/E

17% Spuriously Low Treatment
.001      01    00    00    00    00    00    00    00    00    00    00    00
.005      05    01    01    03    02    00    01    00    02    02    00    00
.01       09    03    03    05    04    01    02    00    03    03    01    01
.02       15    06    07    08    07    02    04    00    06    07    01    02
.03       18    10    12    12    10    04    05    01    09    09    02    03
.04       21    14    15    14    13    07    07    03    12    12    03    05
.05       24    17    18    15    15    10    09    04    14    14    05    07
.07       29    22    23    21    19    17    12    07    18    17    07    10
.10       35    28    28    27    26    25    17    11    23    22    12    14

33% Spuriously Low Treatment
.001      07    01    01    01    02    00    00    00    00    01    00    00
.005      14    03    04    07    05    00    04    00    03    04    01    01
.01       22    08    09    12    10    02    07    00    05    07    02    01
.02       30    14    16    18    15    05    11    03    09    11    04    03
.03       36    20    22    23    20    09    13    06    14    15    07    04
.04       41    24    26    27    23    13    17    10    16    19    10    06
.05       45    29    30    31    26    17    19    11    19    22    13    07
.07       51    36    37    36    32    27    24    17    22    27    17    11
.10       59    44    44    44    38    36    31    25    29    33    24    16


Table 6

Selected ROC Points for Spuriously Low
Response Patterns Generated from the 49-64% Ability Range

False
Alarm                           Proportion Detected by
Rate    LR_p  LR_3 LR_3'   z_p   z_3    F1    F2     S    T2    T4    JK   O/E

17% Spuriously Low Treatment
.001      13    02    02    00    02    00    00    00    00    01    00    00
.005      22    09    08    04    04    00    01    00    05    05    00    00
.01       26    14    13    07    09    03    03    00    08    07    00    01
.02       32    21    20    11    14    09    07    01    13    13    00    04
.03       34    26    25    16    19    17    10    03    19    16    00    06
.04       38    30    28    19    22    21    13    04    22    20    00    08
.05       41    33    31    23    25    24    15    05    26    22    00    11
.07       46    37    35    29    31    31    20    08    29    27    03    16
.10       51    42    40    37    38    34    28    13    36    32    09    21

33% Spuriously Low Treatment
.001      24    09    08    02    08    00    00    00    03    05    00    00
.005      34    16    15    15    14    00    06    00    14    12    00    02
.01       43    25    24    23    22    03    11    01    18    17    00    03
.02       51    33    32    31    29    12    17    04    26    25    00    08
.03       55    39    38    37    36    22    22    07    33    29    01    12
.04       58    43    41    41    39    29    26    10    38    34    01    16
.05       61    46    45    45    43    36    29    13    41    38    02    19
.07       66    51    50    52    49    44    36    19    45    44    06    25
.10       71    58    56    60    57    52    45    27    53    51    14    33

Table 7

Selected ROC Points for Spuriously Low
Response Patterns Generated from the 65-92% Ability Range

False
Alarm                           Proportion Detected by
Rate    LR_p  LR_3 LR_3'   z_p   z_3    F1    F2     S    T2    T4    JK   O/E

17% Spuriously Low Treatment
.001      35    15    14    01    05    01    00    00    01    03    00    --
.005      45    33    31    09    11    11    05    01    11    09    00    00
.01       51    41    38    15    19    24    11    05    15    14    00    00
.02       57    46    45    20    27    40    20    13    25    21    00    02
.03       60    50    49    25    34    45    24    --    32    26    00    03
.04       63    53    51    31    38    53    30    24    36    30    00    04
.05       65    55    53    35    42    57    34    27    39    35    00    06
.07       68    59    57    42    50    61    41    33    43    40    04    10
.10       71    63    61    51    --    65    50    41    53    48    12    10

33% Spuriously Low Treatment
.001      53    31    29    04    26    00    03    00    15    22    00    02
.005      61    44    41    25    39    05    21    02    40    39    00    09
.01       67    53    51    34    52    20    34    09    47    46    00    15
.02       72    59    57    45    61    42    47    21    59    57    00    25
.03       75    63    61    52    67    54    54    29    67    51    00    30
.04       77    66    63    57    70    60    61    36    71    67    00    38
.05       78    68    66    61    74    64    64    40    74    70    00    43
.07       81    72    70    69    79    71    71    49    79    75    15    52
.10       84    75    74    77    84    79    79    58    84    80    33    61

Table 8

Selected ROC Points for Spuriously Low
Response Patterns Generated from the 93-100% Ability Range

False
Alarm                           Proportion Detected by
Rate    LR_p  LR_3 LR_3'   z_p   z_3    F1    F2     S    T2    T4    JK   O/E

17% Spuriously Low Treatment
.001      45    22    22    04    34    --    00    02    --    --    00    --
.005      55    --    --    13    --    77    09    09    --    --    00    00
.01       60    49    46    16    20    43    --    --    21    16    00    00
.02       67    --    53    26    29    55    30    35    33    29    00    00
.03       69    56    56    32    37    60    35    41    --    --    00    00
.04       --    60    53    37    41    63    41    48    47    41    00    01
.05       72    62    60    40    46    66    45    --    --    47    00    --
.07       74    65    62    48    54    71    53    58    56    53    02    03
.10       77    68    66    58    63    75    63    65    64    62    11    06

33% Spuriously Low Treatment
.001      64    42    40    04    32    02    06    00    20    33    00    02
.005      73    55    --    27    49    17    32    06    51    52    00    08
.01       76    61    59    39    62    36    --    21    59    60    00    13
.02       61    67    64    51    71    59    61    39    69    70    00    --
.03       83    70    68    59    77    69    67    48    74    74    00    21
.04       --    72    70    64    79    74    72    55    80    78    00    33
.05       66    74    73    68    82    78    75    59    63    61    00    38
.07       87    77    75    75    86    84    80    68    36    84    21    48
.10       90    79    77    82    90    87    86    76    90    87    41    57